Since the advent of whole-genome sequencing, transposable elements (TEs), once thought to be 'junk' DNA, have drawn
attention because of their high copy numbers in various eukaryotic genomes. Many studies have been conducted to
discover the functions of TEs in their host genomes. Based on the results of those studies, it is generally accepted that TEs
cause genomic and genetic variation. However, their full range of functions has not been fully elucidated. Through
various mechanisms, including de novo TE insertions, TE insertion-mediated deletions, and recombination events, they
reshape their host genomes. In this review, we focus on Alu, L1, human endogenous retrovirus, and short interspersed
element/variable number of tandem repeats/Alu (SVA) elements and discuss how they have affected primate genomes,
especially the human and chimpanzee genomes, since their divergence.
Introduction

Transposable elements (TEs), mobile segments of genetic 
material, were first discovered by McClintock [1]. Since 
then, they have been identified in a variety of eukaryotes [2] . 
Recent genome sequencing projects have consistently shown 
that TEs make up ~50% of primate genomes, while coding 
DNA occupies only ~2% of the genomes [3-5]. TEs are 
generally divided into two categories, DNA transposons and 
retrotransposons (Fig. 1), based on their manner of mobili- 
zation. DNA transposons move using a cut-and-paste me- 
chanism [6] . In contrast, retrotransposons move in a copy- 
and-paste fashion by duplicating the element into a new 
genomic location via an RNA intermediate [7]. Thus, retro- 
transposons increase their copy number more rapidly than 
DNA transposons. 
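
The difference between the two modes of mobilization can be illustrated with a toy model. The sketch below is a deliberate simplification: the function name `mobilize` and the site coordinates are hypothetical, each event moves or copies exactly one element, and the model ignores the ways in which DNA transposons can also gain copies. Under those assumptions, it only shows why a copy-and-paste element accumulates copies while a cut-and-paste element does not.

```python
def mobilize(occupied_sites, source, target, mode):
    """Return the set of occupied genomic sites after one mobilization event.

    mode is "cut-and-paste" (the element leaves the source site) or
    "copy-and-paste" (the source copy stays and a new copy is added).
    """
    if source not in occupied_sites:
        raise ValueError("no element at the source site")
    if mode not in ("cut-and-paste", "copy-and-paste"):
        raise ValueError(f"unknown mode: {mode}")
    sites = set(occupied_sites)
    if mode == "cut-and-paste":
        sites.remove(source)  # excision from the donor site
    sites.add(target)         # integration at the new site
    return sites


# Hypothetical site coordinates, chosen only for illustration.
dna_transposon = {100}
retrotransposon = {100}
for new_site in (250, 400, 900):
    dna_transposon = mobilize(
        dna_transposon, min(dna_transposon), new_site, "cut-and-paste")
    retrotransposon = mobilize(
        retrotransposon, min(retrotransposon), new_site, "copy-and-paste")

# Copy number after three events under each mode.
print(len(dna_transposon), len(retrotransposon))
```

After the same number of events, the cut-and-paste element still occupies a single site, whereas the copy-and-paste element has gained one copy per event.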

Retrotransposons include short interspersed elements (SINEs),
long interspersed elements (LINEs), and human endogenous
retroviruses (HERVs). Alu and short interspersed element/
variable number of tandem repeats/Alu (SVA) elements are
primate-specific retrotransposons, and their full lengths are
~300 bp and ~2 kb, respectively. The Alu element is the most
successful SINE in terms of copy number; ~1.2 million
Alu copies exist in the human genome. A full-length LINE is
~6 kb long, much longer than a SINE. It has two open reading
frames encoding the enzymatic machinery essential for the
propagation of LINE, Alu, and SVA elements; for example, the
Alu element depends on the reverse transcriptase of LINE to
make its dispersed copies in the host genome [8]. In contrast
to SINEs and LINEs, which lack long terminal repeats (LTRs),
a full-length HERV (~10 kb) has two LTRs, with three genes
(gag, pol, and env) located between them [8, 9].

Studies on active TEs have suggested that the elements
can alter gene expression by providing cis-regulatory
elements, such as promoters, enhancers, and transcription
factor binding sites [10]. Through these mechanisms, altered
transcriptional activity could lead to dysfunctional and
abnormal proteins. Through de novo insertion within a
gene, TEs can also alter the gene product, which may be either
harmful or beneficial to the host [11, 12]. When an inserted
TE has a harmful effect on its host genome, the TE is likely
to undergo inactivation and fossilization through the
evolutionary accumulation of mutations and silencing
effects [13].

Since the divergence of the human and chimpanzee, ~6 
million years ago, many TEs have propagated in each 
genome. Among them are the Alu, L1, SVA, and HERV-K
(HML-2) elements. During the past 6 million years, 5,530
Alu, 1,835 L1, 864 SVA, and 113 HERV-K (HML-2) elements
Fig. 1. Structures of transposable elements. These elements can be categorized into retrotransposons (Alu, long interspersed element
[LINE], and human endogenous retrovirus [HERV]) (A) and DNA transposons (e.g., mariner) (B) based on their manner of mobilization.
In addition, autonomous elements (e.g., HERV and LINE) carry coding genes responsible for their own mobilization and also for that of
nonautonomous elements (e.g., Alu and short interspersed element/variable number of tandem repeats/Alu [SVA]). Alu consists of two
monomers separated by an A-rich connector; the left monomer includes an internal RNA polymerase III promoter (A and
B boxes). A full-length LINE is ~6 kb long and has open reading frames (ORFs) encoding an RNA-binding protein, endonuclease, and
reverse transcriptase, which are flanked by untranslated regions (UTRs). ORF1 and ORF2 are separated by an ~60-bp-long intergenic spacer
(IS). SVA contains a (CCCTCT)n hexamer, Alu-like sequences, a variable number of tandem repeats (VNTR), and short interspersed element-R
(SINE-R). An arrow on the Alu-like sequences indicates the direction of Alu. HERV has gag, prt, pol, and env genes, which encode capsid
protein, protease, polymerase, and envelope protein, respectively, used in viral infection; these genes are flanked by long terminal
repeats (LTRs). As an example of a DNA transposon, mariner has a gene encoding a transposase with a DNA-binding domain and a catalytic
domain, flanked by inverted repeats (IRs). All elements are flanked by target site duplications (TSDs) generated during integration. DDE,
the conserved DDE sequence of the mariner transposase; NLS, nuclear localization signal.



are estimated to have been newly inserted in the human
genome (Table 1) [8, 14, 15]. These elements can act as
agents of human-specific genomic rearrangements via
de novo TE insertions, TE insertion-mediated deletions, and
homologous recombination events [12, 16-19]. Furthermore,
some of the recently integrated TEs are capable of producing
new copies in the human genome. These de novo TE insertions
have the potential to cause genomic differences among
human populations and even among human individuals, which
could be related to human phenotypes and diseases [20].
In this review, we describe species-specific TEs and discuss
how they affect their host genomes, illustrating with examples
the mechanisms that they utilize. Taken
together, we suggest that TEs, often called 'junk' DNA, in
fact have many functions and play a significant and dynamic
role in primate genomic evolution.

TEs Recently Inserted into the Human Genome 

Comparative genomics allows us to investigate species- 
specific TEs. Through a combinational method of compu- 
tational data mining and experimental verification, species- 
Table 1. Genomic rearrangements associated with active transposable elements (TEs) in the human genome

Mechanism                          Type of TE   No. of events (deletion size, bp)   References
De novo insertion                  Alu          5,530                               [22]
                                   L1           1,835                               [15]
                                   SVA          864                                 [22]
                                   HERV-K       113                                 [30]
Recombination-mediated deletion    Alu-Alu      492 (396,420)                       [16]
                                   L1-L1        73 (447,567)                        [24]
                                   SVA-SVA      1 (589)                             [33]
Insertion-mediated deletion        Alu          23 (11,206)                         [17, 31]
                                   L1           31 (22,873)                         [18, 32]
                                   SVA          13 (30,785)                         [33]

specific TEs have been well studied in various primate
genomes, including human, chimpanzee, and rhesus macaque.
During the past 6 million years, more than 10,000
human-specific TEs were newly inserted in the human genome [21].
The majority of the identified elements belong to the Alu, L1,
and SVA retrotransposon families. These elements have contributed
to the genomic differences between human and non-human
primates through insertion and post-insertion recombination
between the elements [8, 22].

Alu elements are the most abundant TEs in the human genome.
A master gene model has been generally accepted to explain
the amplification of Alu elements. In this model, only a limited
number of hyperactive "source" or "master" genes are able
to produce a high number of Alu copies over evolutionary
time. A previous study of the AluYb subfamily,
one of the most active Alu subfamilies in the human genome,
found that the copy number of AluYb elements differs greatly
between human and non-human primates; a high
copy number of AluYb elements exists in the human genome, but
only a very small number exists in non-human
primates [23]. The oldest AluYb element resides at its orthologous
position in all hominid primate genomes, demonstrating
that the AluYb subfamily emerged 18 to 25 million years
ago. Then, after approximately 20 million years of retrotranspositional
quiescence, a major expansion of the subfamily
occurred only in the human genome within the past several
million years. To explain this successful proliferation in the
human genome, a new model, the "stealth driver" model, was
introduced. In this model, the high copy number of Alu elements
could be driven at least in part by "stealth driver" elements, which
maintain low retrotransposition activity over extended periods
of time. Although these elements have low retrotransposition
activity, they have the potential to produce short-lived
hyperactive copies responsible for the remarkable expansion
of AluYb elements within the human genome [24].

L1 is another successful element, occupying ~17% of the
human genome. The copy number of Alu elements is much
higher than that of L1 (Alu, ~1.2 million copies; L1, ~520,000
copies). Nonetheless, Alu elements account for only ~11%
of the human genome, because the Alu element is much
shorter than L1 (Alu average length, 300 bp; full-length
LINE, 6 kb) [25]. Through characterization of the sequence
diversity of chimpanzee-specific L1 subfamilies as compared
to their human-specific counterparts, it was concluded that
L1s experienced different evolutionary fates in
humans and chimpanzees within the past ~6 million years.
Although the species-specific L1 copy numbers were of the
same order in the two species (1,800 human-specific L1s vs.
1,200 chimpanzee-specific L1s), the number of
retrotransposition-competent elements was much higher in
the human genome than in the chimpanzee genome. The
species-specific L1s were grouped into several L1
subfamilies. All human L1 subfamilies belonged to a single
lineage, but two distinct L1 lineages were identified in the
chimpanzee genome [15].
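
The relationship between copy number, element length, and genome share quoted above can be checked with simple arithmetic. The following is a minimal sketch that assumes a haploid human genome size of roughly 3.1 Gb; that value, like the function names, is an assumption introduced here and is not a figure from this review. The copy numbers, lengths, and percentages are those given in the text.

```python
# Assumed haploid human genome size in bp (not stated in the text).
GENOME_SIZE_BP = 3.1e9


def genome_fraction(copy_number, mean_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Fraction of the genome occupied by a TE family."""
    return copy_number * mean_length_bp / genome_size_bp


def implied_mean_length(copy_number, fraction, genome_size_bp=GENOME_SIZE_BP):
    """Mean element length implied by a copy number and a genome fraction."""
    return fraction * genome_size_bp / copy_number


# Alu: ~1.2 million copies of ~300 bp each.
alu_fraction = genome_fraction(1_200_000, 300)
# L1: ~520,000 copies occupying ~17% of the genome.
l1_mean_len = implied_mean_length(520_000, 0.17)

print(f"Alu share of genome: {alu_fraction:.1%}")
print(f"Implied mean L1 length: {l1_mean_len:.0f} bp")
```

Under these assumptions the Alu share comes out close to the ~11% quoted above. The implied mean L1 length is about 1 kb, well below the 6-kb full length, which would be consistent with most L1 copies being truncated rather than full-length.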

SVA is shared only by humans and other apes. Fewer than 1,000
copies were detected in the orangutan (divergence ~15 million
years ago), and no SVA was detected in Old World monkeys,
indicating that SVA is a hominid-specific element. Like Alu
elements, SVA elements retrotranspose to another locus in trans
by using the reverse transcriptase encoded by L1s [8]. Due to
this limited mobilization and the short evolutionary time, the
copy number of SVA is very small compared to those of Alu and L1
elements [15, 26, 27]. There are six SVA subfamilies (SVA_A
to SVA_F) in the human genome. The older subfamilies, SVA_A to
SVA_D, evolved in a single lineage, whereas the human-specific
SVA_E and SVA_F were derived independently from their
ancestral sequences. The copy number of SVA was estimated
in the human, chimpanzee, and gorilla genomes,
and there was no significant difference among
them. Alu and L1 each showed a huge
expansion at a specific evolutionary time along the primate
lineage, but SVA has not yet shown any burst in its copy
number [26, 27].

Genomic Rearrangements by TEs 

The comparison of human and chimpanzee genomic
sequences showed that the two genomes have a much higher
sequence identity than expected [3, 4]. Despite this
sequence similarity, TEs have generated remarkable genomic
differences between the two species since their divergence
[22]. Many studies have suggested that a number of
TEs are still capable of retrotransposition and have the potential
to act as a major driver of genomic variation [12, 16-19, 28].
Indeed, TEs have rearranged human and non-human
primate genomes through various mechanisms, such as de
Fig. 2. Schematic representation of genomic rearrangement and gene expression alteration by transposable elements (TEs) in the host genome.
(A) Classical TE insertion by recognition of 5'-TTAAA-3', (B) non-classical TE insertion, (C) nonallelic homologous recombination
(NAHR)-mediated deletion, (D) nonhomologous end-joining (NHEJ)-mediated deletion, (E) mechanism of gene expression alteration by
TEs integrated into the host gene. Depending on the location of insertion in the host gene, TEs could generate alternative transcripts or disrupt
expression. ORF, open reading frame; grey and pink arrow boxes, target site duplication; black line, flanking region; grey line, intervening
region; dotted circles, homologous recombination regions; pink boxes, microhomology region.



novo TE insertion, TE insertion-mediated deletion, and 
homologous recombination between them (Fig. 2) [29]. 
These genomic changes caused by TEs have increased the 
genomic difference between human and non-human pri- 
mates, and some of the human-specific genomic rearrange- 
ments caused human diseases [28, 34] . 
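
As a small illustration of the classical insertion site named in the caption of Fig. 2 (5'-TTAAA-3'), candidate sites can be located by scanning both strands of a sequence for the motif. This is a hedged sketch only: the function names and the test sequence are hypothetical, and an exact string match is a simplification that says nothing about which sites are actually used in vivo.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_sites(seq, motif="TTAAA"):
    """List (0-based position, strand) for exact motif matches on both strands.

    A "-" hit means the motif occurs on the reverse strand; its position is
    reported as the leftmost coordinate on the forward strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    motif_rc = reverse_complement(motif)
    sites = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            sites.append((i, "+"))
        elif window == motif_rc:
            sites.append((i, "-"))
    return sites


# Made-up sequence, used only to exercise the function.
example = "GGTTAAACCTTTAAGG"
print(find_motif_sites(example))
```

For the made-up sequence above, the scan reports one forward-strand match and one reverse-strand match.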

Advanced sequencing technology, including next-generation
sequencing, combined with computational analyses has
accelerated studies on the dynamics of TE mobilization
[35]. Human-specific TEs have been continuously
investigated, and the majority of them are Alu, L1, and SVA
elements [8, 14]. The relationship between human brain
evolution and Alu elements has also been studied. Since the
divergence of the human and chimpanzee lineages, the human
brain has rapidly changed in terms of mass [36]. It is not an
exaggeration to say that Alu elements are in part responsible
for this change in human brain mass. Interestingly, de novo Alu
insertions have been identified in many human brain genes that
are related to neuronal functions and neurological disorders
[37]. The inserted Alu elements belong to AluYa5, AluYb8,
and AluYc1, which are human-specific Alu subfamilies [37].

Approximately 1,800 human-specific L1s were identified
in the human genome [15]. They belong to two different
subfamilies, pre-Ta and Ta; Ta is subdivided into Ta-0 and
Ta-1 by diagnostic nucleotides [38]. Among hominid-specific
SVA subfamilies, SVA_E and SVA_F are detected only in the
human genome, but the other four subfamilies, SVA_A,
SVA_B, SVA_C, and SVA_D, are shared by humans and other
apes, including chimpanzee and gorilla [26]. HERVs entered
the primate genome through germ-line infection [30].
There are approximately 98,000 HERVs in the human
genome. Full-length HERVs are ~10 kb in length, but most
of the HERVs in the human genome are defective due to
truncation and accumulation of mutations during primate
evolution [39]. Among the various HERV subfamilies, HERV-K
(HML-2) is the youngest in the human genome [8,
40]; 113 human-specific HERV-Ks were identified in the
human genome, of which 15 are full-length HERV-Ks and
98 are solitary LTRs [39].
These de novo TE insertions show polymorphisms among
human populations and even among human individuals [31, 39, 41,
42]. Therefore, they have the potential to be used as genetic
markers for racial identification [31, 41, 42].

De novo TE insertions contribute to genome expansion.
However, some of them have instead reduced the size of their host
genomes through insertion-mediated deletion of host
genome sequences [32, 33]. Through comparative genomic
analyses, 50 L1 insertion-mediated deletion events were
found in the human and chimpanzee genomes [18]. The sizes
of the deleted sequences were variable, and in sum, ~18 kb
and ~15 kb of sequence were removed from the human and
chimpanzee genomes, respectively. Based on this result, it
was estimated that L1 insertions may have deleted up to 7.5
Mb of target genomic sequence during the primate radiation.
Alu insertions were also involved in genomic deletions
at their insertion target regions through Alu
retrotransposition-mediated deletion. A total of 33 deletion events,
responsible for ~9,000 bp of deleted sequence in the human and
chimpanzee genomes, were identified. It was suggested that Alu
retrotransposition may have contributed to over 3,000 deletion
events, leading to the deletion of ~900 kb during primate
evolution [17]. Additionally, 13 SVA insertion-mediated
deletions (SIMDs) were identified in the human
genome, and they deleted 30,785 bp of the human genome
compared with the chimpanzee genome (Table 1). Among
the 13 SIMDs, 9 were associated with the SVA_D subfamily,
which accounts for the largest portion of SVAs, suggesting that
SIMD frequency is directly correlated with the copy number of
SVA elements. Furthermore, one of the deletion events
occurred in the tMDC II gene, which is associated with sperm-egg
binding prior to fertilization [43, 44].

After TEs insert into the host genome, they can
generate genomic variations through unequal homologous
recombination between copies [17, 19, 43]. The frequency of
this recombination is closely related to the copy number of
the TEs. Thus, compared to other TEs,
Alu and L1 elements have a high probability of generating
genomic structural variations due to their abundance. Indeed,
492 Alu recombination-mediated deletions (ARMDs) were
identified in the human genome, and they deleted ~400 kb
of human genomic sequence (Table 1). About 60% of the
deletion events were related to known or predicted genes,
including three that deleted functional exons. Thus, the
ARMD process has produced a considerable portion of the
genomic and phenotypic variations between humans and
chimpanzees since the divergence of the two species [16].
Recombination between L1 elements has also deleted
human genome sequences. Seventy-three L1 recombination-
associated deletions (L1RADs) were identified in the human
genome [45]. The sizes of the deletion events range from 56
to 64,113 bp, and ~450 kb of human genomic sequence was
deleted through the L1RAD process (Table 1). Thus,
L1RAD events have deleted 25 times as much human genomic
sequence as L1 insertion-mediated deletion events (~18 kb) [18,
45].
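
The deletion figures discussed above can be tabulated to compare the two deletion mechanisms. The sketch below copies the event counts and total deleted base pairs from Table 1 and computes the mean deletion size per event and the total per mechanism; the dictionary layout and function names are illustrative choices made here. The final ratio uses the rounded values quoted in the text (~450 kb for L1RADs and ~18 kb for L1 insertion-mediated deletions in the human genome), not the Table 1 entries.

```python
# (mechanism, TE type) -> (number of events, total deleted bp), from Table 1.
TABLE1_DELETIONS = {
    ("recombination-mediated", "Alu-Alu"): (492, 396_420),
    ("recombination-mediated", "L1-L1"): (73, 447_567),
    ("recombination-mediated", "SVA-SVA"): (1, 589),
    ("insertion-mediated", "Alu"): (23, 11_206),
    ("insertion-mediated", "L1"): (31, 22_873),
    ("insertion-mediated", "SVA"): (13, 30_785),
}


def mean_deletion_size(events, total_bp):
    """Mean deleted sequence per event, in bp."""
    return total_bp / events


def total_by_mechanism(table):
    """Sum the deleted bp over all TE types for each mechanism."""
    totals = {}
    for (mechanism, _te), (_events, total_bp) in table.items():
        totals[mechanism] = totals.get(mechanism, 0) + total_bp
    return totals


for (mechanism, te), (events, total_bp) in TABLE1_DELETIONS.items():
    mean_bp = mean_deletion_size(events, total_bp)
    print(f"{mechanism:24s} {te:8s} {mean_bp:8.0f} bp/event")

totals = total_by_mechanism(TABLE1_DELETIONS)
print(totals)

# Ratio quoted in the text: ~450 kb (L1RAD) vs. ~18 kb (L1 insertion-mediated).
l1_ratio = 450_000 / 18_000
print(l1_ratio)
```

On these numbers, recombination-mediated deletions remove more than ten times as much sequence in total as insertion-mediated deletions, and the mean L1-L1 deletion is several kilobases, compared with under 1 kb for the mean Alu-Alu deletion.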

Genomic Instability Generated by TEs 

TEs inserted in intragenic and intergenic regions can
alter cellular gene expression, increasing genomic instability
[9]. About a decade ago, gene regulation by TEs was studied
only in specific genes through experimental validation. More
recently, however, genome-wide analyses of gene regulation by TEs
have been conducted thanks to the development of high-throughput
technologies. The findings showed that TEs carry many
regulatory sequences, such as promoters, enhancers,
polyadenylation signals, and cryptic splice donor (5') and
acceptor (3') sites, by which the transcript architecture of
nearby genes can be altered [10, 46, 47].

When TEs insert into the intronic region of a gene, they
can create a new exon by providing splice sites, a
process called "exonization" [48]. This mechanism is
related to exon variations, such as cassette exons and intron
retention, and can increase mRNA instability [49]. A
TE residing upstream of a gene can act as an alternative
promoter, leading to new alternative transcripts with a new
transcriptional start site [10]. Some TEs carry bidirectional
promoters with transcription factor binding sites. For



www.genominfo.org 



Genomics & Informatics Vol. 10, No. 4, 2012 



example, LTR and L1 have sense promoters initiating their
own transcription and antisense promoters having the potential
to initiate the transcription of other genes in the opposite
direction [50, 51]. There is a microRNA gene cluster on
human chromosome 19 (C19MC), over 100 kb in length.
This cluster consists of duplications of a core cassette
that includes a minus-strand Alu element. The cluster expanded
during primate evolution, and the Alu element
promotes microRNA expression by RNA polymerase III
[52]. TEs not only initiate transcription but also terminate it
by providing a polyadenylation signal. A full-length L1
contains 19 polyadenylation signals that could cause premature
mRNA truncation [53]. Genes that contain TEs in their
genic region tend to produce various transcript
forms, increasing transcriptome diversity [10].

The orientation of TEs could be a factor affecting gene
expression, which is well described by the "head-on collision"
hypothesis. During DNA replication, DNA polymerase collides
with RNA polymerase transcription complexes moving
in the opposite direction.
It was observed that the collision slows down
DNA replication [54]. When an active TE lies in
the opposite orientation to a nearby gene, an RNA polymerase
transcription complex transcribing the TE could
encounter the RNA polymerase transcription complex
transcribing that gene, which could reduce the
expression of the gene [54, 55]. For example, transcription of
the ε-globin gene is repressed by an Alu element that has
been inserted in the opposite direction to the gene [56]. On
the other hand, TEs inserted in the same direction as their
nearby genes show no effect on the expression of those genes
[55].

Histone modification plays an important role in
transcriptional regulation, and through this process, the
host genome can regulate the activation of TEs [57, 58].
Indeed, most TEs are accompanied by repressive histone
modifications (e.g., H3K9me2 and H3K27me3), which cause
the formation of heterochromatin. Conversely, TEs can
affect the expression of host genes through histone
modifications [59, 60]. When the level of histone modifications was
calculated for all families of human TEs, older TE families
carried more histone modifications than younger families.
Interestingly, TEs proximal to genes carry more histone
modifications than those distal to genes, which
suggests that some epigenetic modifications of TEs may
serve to regulate the expression of host genes [61].

DNA methylation is a strong silencing mechanism, and the
host genome can use this process to repress the activation
of TEs [61]. In general, DNA methylation occurs at CpG
dinucleotides. Because Alu and SVA elements are rich in
CpG dinucleotides, they are vulnerable to methylation
[26, 62]. It was observed that TEs regain their ability
to mobilize and to regulate the expression of host genes when
the silencing effect weakens with increasing
genomic instability. In addition, the demethylation of TEs is
associated with human diseases, most commonly cancer
[63-65].
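
CpG richness of a sequence, which underlies the vulnerability to methylation described above, can be quantified in a few lines. The sketch below counts CpG dinucleotides and computes the commonly used observed/expected CpG ratio, (CpG count x length) / (C count x G count). The function name and the short test sequence are hypothetical; the sequence is not a real Alu or SVA sequence.

```python
def cpg_stats(seq):
    """Return (CpG count, GC fraction, observed/expected CpG ratio)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so this is exact
    gc_fraction = (c_count + g_count) / n
    # Observed/expected ratio: (CpG count * length) / (C count * G count).
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, gc_fraction, obs_exp


# Made-up 12-bp sequence, used only to exercise the function.
cpg, gc, ratio = cpg_stats("ACGCGTTACGGA")
print(cpg, round(gc, 3), ratio)
```

A ratio near or above 1 indicates that CpG dinucleotides occur at least as often as expected from base composition alone; applied to real element sequences, the same calculation would give a rough measure of how many potential methylation sites an element carries.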

miRNAs, among the most active factors regulating gene
expression, can be derived from TEs. Fifty-five miRNA genes
derived from TEs were identified in the human genome, and
their characterization showed that TE-derived miRNAs
could potentially contribute to the complexity and dynamics of
human gene regulation [66].

Conclusions 

TEs have shown a variety of impacts on their host 
genomes. In this review, we describe HERV, Alu, L1, and SVA
elements, which are thought to still be active in the human 
genome. A number of research studies related to TEs have 
shed new light on their amplification mechanisms and their 
function in primate genomes. Furthermore, recent research 
of TEs in the rhesus macaque genome provides a glimpse 
into their diversity and strong influence on the overall 
differences in genomic architecture between the Old World 
monkey (e.g., rhesus macaque) and hominid (e.g., human 
and ape) lineages [67] . The occurrence of de novo TE inser- 
tions, TE insertion-mediated deletions, and post-insertion 
recombination between TEs within the human and chim- 
panzee lineages has caused genetic alteration, lineage-specific 
genomic rearrangements, and phenotypic variations, further 
contributing to the divergence of humans and chimpanzees. 
As a whole, this review calls into question whether TEs 
should be considered "junk" DNA at all. Rather, TEs repre- 
sent a potent evolutionary force associated with genomic 
fluidity in their host genomes. 